Using a Permutation Test for Attribute Selection in Decision Trees

Authors

  • Eibe Frank
  • Ian H. Witten
Abstract

Most techniques for attribute selection in decision trees are biased towards attributes with many values, and several ad hoc solutions to this problem have appeared in the machine learning literature. Statistical tests for the existence of an association with a prespecified significance level provide a well-founded basis for addressing the problem. However, many statistical tests are computed from a chi-squared distribution, which is only a valid approximation to the actual distribution in the large-sample case, and this patently does not hold near the leaves of a decision tree. An exception is the class of permutation tests. We describe how permutation tests can be applied to this problem. We choose one such test for further exploration, and give a novel two-stage method for applying it to select attributes in a decision tree. Results on practical datasets compare favorably with other methods that also adopt a pre-pruning strategy.
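To make the idea of a permutation test concrete, the following is a minimal sketch of a generic Monte Carlo permutation test for association between one nominal attribute and the class. It is not the specific test the authors select, nor their two-stage method, neither of which is spelled out in the abstract. The function names `chi_squared` and `permutation_p_value`, the use of the Pearson chi-squared statistic as the test statistic, and the defaults `n_permutations=1000` and `seed=0` are all assumptions made for illustration.

```python
import random
from collections import Counter


def chi_squared(attr_values, class_labels):
    """Pearson chi-squared statistic of the attribute-by-class contingency table."""
    n = len(attr_values)
    attr_counts = Counter(attr_values)
    class_counts = Counter(class_labels)
    cell_counts = Counter(zip(attr_values, class_labels))
    stat = 0.0
    for a, n_a in attr_counts.items():
        for c, n_c in class_counts.items():
            expected = n_a * n_c / n
            observed = cell_counts.get((a, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(attr_values, class_labels, n_permutations=1000, seed=0):
    """Estimate a p-value by comparing the observed statistic with the
    statistics obtained after randomly shuffling the class labels.

    Hypothetical helper for illustration; the statistic is used only to
    rank tables, so no chi-squared distribution is assumed.
    """
    rng = random.Random(seed)
    observed = chi_squared(attr_values, class_labels)
    shuffled = list(class_labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if chi_squared(attr_values, shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

Under these assumptions, a tree learner could compute `permutation_p_value` for each candidate attribute at a node and stop splitting when no attribute falls below the chosen significance level. Because the reference distribution is built from the data at that node, the estimate does not rely on the large-sample approximation that breaks down near the leaves.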

Similar resources

Unified Framework for Evaluation Metrics in Classification Using Decision Trees

Abstract. Most evaluation metrics in classification are designed to reward class uniformity in the example subsets induced by a feature (e.g., Information Gain). Other metrics are designed to reward discrimination power in the context of feature selection as a means to combat the feature-interaction problem (e.g., Relief, Contextual Merit). We define a new framework that combines the strengths of both ...

Rademacher Penalization over Decision Tree Prunings

Matti Kääriäinen and Tapio Elomaa, Department of Computer Science, University of Helsinki, Finland, {matti.kaariainen,elomaa}@cs.helsinki.fi. Abstract. Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It would appear to be limited to relatively simple hypothesis classes because of computational comple...

Termination of Constraint Contextual Rewriting

Abstract. The effective integration of decision procedures in formula simplification is a fundamental problem in mechanical verification. The main source of difficulty occurs when the decision procedure is asked to solve goals containing symbols which are interpreted for the prover but uninterpreted for the decision procedure. To cope with the problem, Boyer & Moore proposed a technique, called augmentati...

Fractionally Spaced DFE with Subband Decorrelation

Stephan Weiss (1), Markus Rupp (2), and Lajos Hanzo (1). (1) Dept. Electronics & Computer Science, University of Southampton, UK. (2) Wireless Research Lab / Bell-Labs, Lucent Technologies, Holmdel, NJ, USA. {sw1,lh}@ecs.soton.ac.uk, rupp@lucent.com. Abstract. In this paper we proposed a modification of the "classic" fractionally spaced decision feedback equaliser (FSDFE) in order to increase the slow convergence r...

Publication date: 2009